INSPIRE: Intensity and spatial information-based deformable image registration

We present INSPIRE, a top-performing general-purpose method for deformable image registration. INSPIRE brings distance measures which combine intensity and spatial information into an elastic B-splines-based transformation model and incorporates an inverse inconsistency penalization supporting symmetric registration performance. We introduce several theoretical and algorithmic solutions which provide high computational efficiency and thereby applicability of the proposed framework in a wide range of real scenarios. We show that INSPIRE delivers highly accurate, as well as stable and robust registration results. We evaluate the method on a 2D dataset created from retinal images, characterized by presence of networks of thin structures. Here INSPIRE exhibits excellent performance, substantially outperforming the widely used reference methods. We also evaluate INSPIRE on the Fundus Image Registration Dataset (FIRE), which consists of 134 pairs of separately acquired retinal images. INSPIRE exhibits excellent performance on the FIRE dataset, substantially outperforming several domain-specific methods. We also evaluate the method on four benchmark datasets of 3D magnetic resonance images of brains, for a total of 2088 pairwise registrations. A comparison with 17 other state-of-the-art methods reveals that INSPIRE provides the best overall performance. Code is available at github.com/MIDA-group/inspire.


Introduction
Deformable image registration is the process of finding dense correspondences between images which maximize the affinity of co-occuring structures while simultaneously preserving spatial coherence [1][2][3]. Numerous applications in science and technology require deformable image registration. Medical and biomedical imaging scenarios often involve acquisition of images of the same specimen from multiple views or at different times, requiring image registration for information fusion [4]. Inter-subject and atlas-to-subject correspondence maps are commonly used for comparative studies as well as for atlas-based segmentation [5]. Tracking of objects in computer vision often relies on the ability to find deformable correspondence maps [6].
To reach top performance on a particular task, an existing method for deformable image registration, see e.g., [1,6,7], must typically be tuned or modified to fit the specific characteristics of the application, using carefully designed feature extraction and selection as well as task optimized pre-processing [8]. Intensity-based registration, involving maximization of a similarity measure using local optimization, is one of the main approaches towards general-purpose deformable image registration [9][10][11], in particular in the medical and biomedical contexts.
Learning-based methods have become more prominent in recent years, aiming to replace manual (engineered) tuning to the specific task with learning based such. The most successful of these methods involve training a neural network to perform the registration task in only one or a few prediction steps. Once trained, they typically offer reduced run-times [7,[12][13][14], however requiring representative training data and in many cases delivering lower registration accuracy than the iterative approaches [7]. We present INSPIRE, a novel state-of-the-art deformable registration method based on minimization of a distance measure which combines intensity and spatial information [15], modelling the deformations as cubic B-spline transformations. INSPIRE is symmetric, intensity interpolation-free, and robust to large displacements of image structures. To further increase utility of the method, we propose an efficient Monte Carlo algorithm for estimating the distances and gradients which substantially reduces the memory required and increases applicability of the framework to big (image) data. We also propose a gradient-weighted sampling scheme to further improve the efficiency of stochastic subsampled optimization. An open-source implementation is available at github.com/MIDA-group/inspire. Through its efficient combination of stochastic subsampled optimization and cubic B-Spline transformations, INSPIRE achieves a low run-time, while simultaneously achieving high accuracy.
We conduct three major experiments comparing the performance of INSPIRE with both intensity-based methods and widely used learning-based methods: (i) Registration to recover known transformations of 2D retinal images, as illustrated in Fig 1. Here INSPIRE substantially outperforms both the learning-based method VoxelMorph [13] and intensity-based methods [10,11]; (ii) Registration of 134 real 2D retinal image pairs of the Fundus Image Registration Dataset (FIRE), where the method is compared against five other methods, including the better the alignment of the vessel networks, the less magenta/cyan colour visible in the image): (a) Initial deformation, and the results of registration using (b) elastix [11], (c) ANTs SyN [10], and (d) INSPIRE. Bottom row: The respective (Ground Truth) segmented vessel networks in the observed images in the top row (only used for evaluation, not for registration), with their corresponding Jaccard index (J). We observe that elastix (b, f) registered reasonably well the upper part of the vessel network, but performed poorly on the lower region. ANTs SyN (c, g) registered some parts of the image well, but failed to obtain an accurate overall alignment, and produced severe distortions and artefacts. INSPIRE produced a very accurate alignment, with barely visible errors. Performance on the whole Retinal dataset is summarized in Fig 4. https://doi.org/10.1371/journal.pone.0282432.g001  [16], where the method is compared against 17 other methods (including learning-based methods Quicksilver [14] and Voxel-Morph [13]) on 4 datasets-LPBA40, IBSR18, CUMC12, MGH10-for a total of 2088 pairwise registrations. On IBSR18 and MGH10, INSPIRE is the best performing method, and on LPBA40 and CUMC12, INSPIRE reaches the second best performance. The method is fast, easy to use, utilizes parallel processing, and requires small amount of auxiliary memory.

Contributions, related work, preliminaries
We start with summarizing the main contributions of this paper, while also relating it to the relevant work in the field. Then we introduce the main concepts and notation that this work relies on.

Contributions
INSPIRE extends the affine image registration framework [15] to deformable image registration, a substantially more challenging (and ill-posed) task. The main contributions of this paper are: (i) The proposed objective function, based on the α-AMD distance measure [15], while also incorporating an inverse inconsistency regularization term delivering symmetric registration performance also for non-invertible transformation models; (ii) A novel stochastic point-sampling method based on Gaussian gradient magnitudes that increases the performance of the optimization process; (iii) An intensity interpolation-free distance (and gradient) estimation method based on Monte Carlo integration that overcomes the limitations (high memory requirements, and requirement to quantize image intensities) of the discretization approach of [15] and enables registration of gigapixel images; (iv) A derivative normalization scheme that stabilizes the optimization and simplifies the tuning of hyperparameters; (v) A created open access image dataset of thin vessel structures, suitable for quantitative evaluation of registration methods.

Most relevant related work
There exist a large number of intensity-based deformable image registration methods [3,7], out of which some are closely related to this study. In addition to methods based on B-splines [9,17], we particularly mention two methods based on non-parametric dense displacement fields [10,18]. Thirion's Demons [18] is a non-parametric asymmetric diffusion-based method for image registration, where a (dense) displacement field is updated via composition, using smoothing of the update field, as well as of the total field, as regularization. A variant of the Demons algorithm that addresses the lack of symmetry of the original version is the Diffeomorphic Demons method [19]. Another example of a symmetric non-parametric method is Symmetric Normalization (SyN) [10], which aims at finding two geodesic paths, one from each space (of the two images to be registered), to a common 'half-way' space. A cost term is used to maximize similarity between the two input images warped into this common space.
Among most recent work, much research has been dedicated to Deep Learning for deformable image registration. Two relevant approaches aim for general applicability: Quicksilver [14], trained to emulate an iterative method, operates on patches using a convolutional neural (encoder-decoder) network to predict a deformation model based on image appearance, imitating the "ground-truth" non-learning-based method [20] with a few prediction steps; VoxelMorph [12,13,21] is a successful example of another popular approach which uses unsupervised training of a CNN to directly predict the transformation, given the full images/ volumes as input. The training of the network is done with a similarity measure as objective function.

Preliminaries and notation
2.3.1 Intensity-based deformable image registration. Given a distance measure d between images and a set of valid transformations O, deformable intensity-based registration of two images, A (floating) and B (reference), can be formulated as the regularized optimization problem,T where R denotes a regularization functional on T, modelling prior knowledge (or preference) -e.g., smoothness or invertibility-ofT (if not guaranteed by the choice of O).

Images as fuzzy sets.
The theory of fuzzy sets [22] provides a framework where gray-scale images are conveniently represented as spatial fuzzy sets. A fuzzy set S on a reference set X S is a set of ordered pairs, S ¼ fðx; m S ðxÞÞ : x 2 X S g. Representing images, the membership function m S : X S ! ½0; 1� assigns a value (intensity) to each pixel in the image domain X S . We denote the dimensionality of the image domain X S by n and, for rectangular domains, we denote the domain size by N 2 Z n (a vector of side lengths of the nD-rectangle along each dimension). The complement of a fuzzy set S is � S ¼ fðx; 1 À m S ðxÞÞ : x 2 X S g. An α-cut of a fuzzy set S is a crisp set a S ¼ fx 2 X S : m S ðxÞ � ag, i.e., a thresholded image. The height hðpÞ of a fuzzy point p (a single-element fuzzy set) is equal to its intensity. For details see [15,22,23].

Fuzzy point-to-set and set-to-set distances.
A bidirectional fuzzy point-to-set distance based on α-cuts [23,24], between fuzzy point p and set S is Given fuzzy set A on reference set X A � R n , fuzzy set B on X B � R n , a weight function w A : X A ! R �0 , and a crisp subset (mask) M B � R n , the Asymmetric average minimal distance for image registration [15] from A to B, parameterized by a transformation T : whereX ¼ fx : x 2 X A^T ðxÞ 2 M B g.

B-Splines for deformable registration.
The free form deformation (FFD) model [9] is a parametric non-rigid deformation model, based on cubic B-spline transformations, that is widely used in image registration [11,25].
Registration based on (5) is compatible with two primary multi-scale strategies (which can be combined) for increasing both robustness and accuracy: (i) Gaussian resolution pyramids, and (ii) multi-resolution B-spline grids [9].

Symmetric transformations and inverse inconsistency.
Symmetric registration frameworks, where the roles of the reference and floating images are interchangeable, generally exhibit improved robustness and performance as well as reduced impact of interpolation. Some methods are symmetric by construction [10,19], although, the involved iterative estimation of dense displacement fields make them slow. However, not all methods nor all transformation models (cubic B-splines, e.g.) are invertible (or closed under inversion). Given transformation T AB , which denotes a transformation that aligns image A with image B, and transformation T BA that aligns image B with image A, T AB and T BA may not be each other's inverses. In such cases, where a registration approach and transformation model does not provide invertibility by construction, we may still obtain many of the benefits of an invertible model through a regularization scheme.
The inverse inconsistency measure [26] quantifies the deviation of a pair of transformations T AB and T BA from being each other's inverses, and is, for a point x (in the space of image A), given by The total IIC, given by, e.g., summation or averaging, can be used as a cost term to promote invertibility, an idea introduced in [26] which has remained a topic of interest in image registration [17,27,28]. 2.3.6 Stochastic sampling of points. The efficiency of many intensity-based image registration methods can be dramatically increased by use of stochastic subsampling [15]. Quasirandom (or stratified) sampling shows to be a preferable choice in stochastic subsampling, compared to, e.g., uniform subsampling [29]. One well performing approach is an n-dimensional Kronecker sequence, where the i-th sample x(i) = (x 1 (i), . . . x n (i)) is given by for suitable (irrational) choices of a 1 , . . ., a n . It has been observed [30] that generalized golden ratio numbers a 1 ¼ � À 1 n ; . . . ; a n ¼ � À n n are suitable choices. They are given by the following nested radical recurrence

Intensity and spatial information-based deformable image registration
We introduce INSPIRE-a novel method for performing deformable image registration by directly solving the optimization problem defined by (1) using local optimization. To this end, we define an objective function consisting of a distance term which combines intensity and spatial information [15,23], and a regularization term which promotes inverse consistency [26], while we choose a set of transformations (O) based on cubic B-splines [9]. INSPIRE addresses registration by solving a sequence of above described optimization problems (which we refer to as stages) in a coarse-to-fine manner, with images of varying resolution and smoothness, and utilizing transformations with varying number of control-points [9]. Each optimization problem is solved using local search with a gradient-based optimization procedure, starting from an interpolated version of the transformation found at the previous stage [31,32].

An objective function for deformable image registration
We first define our novel objective function for image registration. It builds on distance measures of our framework for affine symmetric registration [15], but excludes the assumption of existence of (or access to) an inverse of the transformation. Aiming for symmetric image registration (even) in the absence of invertible transformations, we incorporate, as regularization, a soft constraint based on the IIC (6), inspired by [17,26,27]. The proposed objective function is directly compatible with cubic B-spline transformations or other spline variations, as well as registration based on non-parametric displacement fields. Let a fuzzy set (image) A on reference set (domain) X A , fuzzy set B on X B , weight functions w A : X A ! R �0 and w B : X B ! R �0 , crisp subsets (masks) M A ; M B � R n , crisp subsets (sets of sample points) P A � X A and P B � X B , and regularization parameter l 2 R �0 , be given.
We define the average asymmetric weighted inverse inconsistency (AW-IIC) of a transformation pair T AB , T BA as Combining (3) and (9), we define the core of our proposed registration framework-an objective function for deformable image registration with inverse inconsistency regularization, for a transformation pair T AB , T BA , as Using subsets P A � X A and P B � X B created by random sampling in (10) enables using optimization methods that rely on stochastic sampling, such as stochastic gradient descent. This is a strategy to speed up parametric (B-spline) models that do not require dense sampling.
The objective function (10) suitably combines the squared norm used in (6) with the (nonsquared) Euclidean point-to-point distance inbuilt in (3): for small inverse inconsistencies the distance term dominates while for larger inverse inconsistencies the IIC term gradually overtakes.

Gradient-based stochastic optimization
To minimize (10), we employ stochastic gradient descent with momentum (SGDM), with subsets (P A and P B ), created by sampling, when computing the objective function and its derivatives. The gradient of the first part of (10), related to distance between the images, is given by the estimation method presented in Sec. 3.3, whereas the derivative of (6) (the second part of (10)) follows by differentiation of (5) and the chain-rule.
In the following subsections, we introduce a novel point-sampling method and a weightnormalization scheme for improving the performance of the stochastic optimization.

Novel point-sampling method.
We observe that large smooth corresponding image regions tend to overlap early in the registration process, yielding very small gradients for the subsequent iterations.
Gradient-weighted sampling of points. We propose to sample the points (in an image domain X) with non-uniform probabilities given by the normalized spatial Gaussian gradient magnitude where X is the random point variable and GM(y;σ GM , t GM ) denotes the Gaussian gradient magnitude, with smoothing σ GM and where values below t GM are set to 0. If no points exceed the gradient magnitude threshold, (11) is undefined, in which case we use uniform sampling instead. The parameter σ GM of the Gaussian-derivative allows tuning to the size of structures and the desired width of the region of mass around edges. An example of sampling based on (11), applied to a retinal image, is shown in Fig 2. In comparison to previous approaches [33], this sampling does not depend on the transformation; each image is sampled independent of the other and the probabilities thus remain unchanged for all iterations corresponding to a given resolution level in the coarse-to-fine pyramid optimization, saving computational time. Furthermore, [33] is asymmetric, assigning distinct roles to the two images, unlike the proposed sampling method.
Combination of sampling schemes. Sampling from p Δ (11) is more computationally demanding than uniform sampling (p U ). Furthermore, it omits uniform regions entirely, which can result in failed registration or locally high inverse inconsistency, if used exclusively. We therefore propose a sampling method which combines the two methods to obtain the benefits of both p Δ and p U without their respective drawbacks. The suggested mixture probability model (parameterized by m) is where p Δ is the gradient-weighted sampling method (11) and p U is the uniform sampling method. First a sampler is selected at random, and then a point is sampled from the chosen sampler.

Derivative scaling.
The combination of the SGDM optimizer and cubic B-spline transformations poses a particular challenge. The magnitudes of the derivatives related to a given transformation parameter depend heavily on grid coarseness and how many (and which) points are sampled inside the parameter's local support. To both decrease the impact of the random sampling on the magnitude of the derivatives and to simplify tuning of the stepsize (by making it less dependent of grid coarseness), we propose a derivative scaling scheme for the partial derivatives of the objective function J.
Let T i denote the i-th parameter of the transformation T, i.e, one coordinate of one of the control-points in F (5). The partial derivatives w.r.t. T i are scaled in the SGDM optimization by the factor 1 g i AB þε , where g i AB expresses the total impact of the parameter on the transformation of the points in P A and P B (in (10)), and is given by whereas ε is a small constant (we use ε = 0.01).

Estimation of α-cut distances and their spatial derivatives by Monte Carlo sampling
Computation of the objective function (10) and its derivatives requires estimation of (3), which is based on the point-to-set distance measure (2). The family of methods proposed in [15,23] for performing this estimation on rectangular (grid) domains require quantization of (image) intensities into a discrete number of levels, followed by computation of distance transforms on each intensity level. This approach has two main limitations: (i) loss of information due to (coarse) quantization, (ii) substantial run-time and memory usage for computation and storage of the distance transforms. Here we propose a novel method for estimating (2) with an approximation based on Monte Carlo integration. Given N α randomly sampled α-cuts, for a 1 ; . . . ; a N a 2 ½0; 1�, the estimator e dðp; SÞ � đ a ðp; SÞ is defined as y (1) , y (2) , R (1) , R (2) , k SPLIT-RECT(y, R; s) 9: if [P 1 ] k � [y (2) ] k then 10:

7:
if Q n j¼1 R j ¼ 1 then

8:
t i m S ðyÞ 9: else 10: y (1) , y (2) , R (1) , R (2) , k SPLIT-RECT(y, R;s) 11: BUILD À TREE À RECðt; LEFTðiÞ; y ð1Þ ; R ð1Þ ; SÞ 12: BUILD À TREE À RECðt; RIGHTðiÞ; y ð2Þ ; R ð2Þ ; SÞ 13: end if 15: end procedure Algorithm 3 constructs an augmented KD-tree [34], tðSÞ, that corresponds to the input image S. Within each node of tðSÞ we store the maximum intensity of that sub-tree. This enables an efficient distance search, where sub-trees which are empty after α-cutting can be pruned. The discrete rectangular domain enables storing the node attributes compactly in a single array. The left and right sub-trees of a node indexed by i have indices 2i and 2i + 1 respectively, where the root has index 1.
Sampling and multi-sample search. An initial step of Algorithm 1 is to sample α-cuts for which to compute the distance. We propose to use the quasi-random sampling (7), with n = 1. Algorithm 1 can be implemented to perform a joint search for the closest point distances for all the N α sampled α-cuts in parallel, yielding a substantially lower computational time. To simplify the pseudo-code, multi-sample search is omitted from the description (but is used in our provided implementation).
Sub-pixel distances. To enable evaluation of (14) everywhere in the image space, we compute the distance from all nearby grid-points of the point of interest p, followed by distance interpolation. We use bilinear/trilinear interpolation to compute the sub-pixel distance, and the discrete gradient approximation evaluated in the mid-point (to avoid vanishing derivatives at saddle-points of the interpolation surface).
Space subdivision and lower distance bound. To facilitate efficient search for closest point distances in Alg. 1, using the (implicit) KD-tree data structure constructed by using Alg. 3, we require systematic methods for space subdivision and for computing lower distance bounds from points to rectangles corresponding to nodes in the tree.
Given an nD-rectangle of size R = (R 1 , . . ., R n ) spels (nD-pixels), with y denoting its corner point with minimal coordinates, and s = (s 1 , . . ., s k , . . ., s n ) the spel spacing along each dimension, let SPLIT-RECT(y, R;s) be a function that splits a rectangle in half (up to integer precision), into two integer-sized sub-rectangles, along the dimension given by arg max k2f1...ng s k ðR k À 1Þ, breaking ties systematically.
As a lower bound for the distance dðp; a SÞ, we use the Euclidean distance from the point p to the nearest spel inside an nD-rectangle (characterized by y, R and s) with maximum intensity � α, given by d R ðp; y; R; sÞ ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi P n i¼1 maxð0; y i À p i ; p i À ðy i þ s i ðR i À 1ÞÞÞ To further speed up the search, we may substitute d R withd R , which relaxes the bound by a factor of β for distance bounds which exceed a provided threshold d t . This may significantly reduce the number of nodes that need to be searched in exchange for potential over-estimation

Preprocessing and implementation
INSPIRE uses Gaussian resolution pyramids and a coarse-to-fine pyramid B-spline schedule with increasing number of control points. At each level, a downsampling factor, a parameter σ controlling the width of the Gaussian filter used to smoothen the image, and the number of control-points (along each dimension) are to be selected. Furthermore, at each level, (optional) pre-processing steps of (robust) normalization (to [0, 1]) using the q-percentile of image intensities m S (similar as in [15]), and histogram equalization, are used to remove differences in value range (and outliers), and homogenize the global contrast of the images to be registered. INSPIRE is implemented in the C++ language, as an extension of the Insight Toolkit (ITK) [35].

Performance analysis
To validate the practical relevance of the proposed method, we present results of an empirical study of the performance of INSPIRE, compared with several state-of-the-art methods, when The red circle is the fuzzy point p, from which the distance is computed, and the four nearby star ( � ) markers indicate the nearest grid points (P). The two crosses (×) represent points of S that surpass the sampled intensity threshold α. The gray rectangle shows the region (y, R) corresponding to the current node i in the KD-tree. The (lengths of the) dashed lines (blue) represent the distances from each of the nearest grid-points to the current closest point (the upper ×). The (lengths of the) solid lines (black) represent the lower bound of the distance to any point within the rectangle. In the example, the lower bound is smaller than at least one of the current lowest distances, and the rectangle contains at least one object point, and thus the sub-tree will be searched. The (lengths of the) dash-dotted (red) lines represent the distances from the grid-points for which a closer point was found by searching within the rectangle.
https://doi.org/10.1371/journal.pone.0282432.g003 used to solve image registration tasks in 2D and 3D. The data and several scripts to generate the plots for the paper can be located in [36].

Synthetic evaluation on retinal images
Thin structures, such as vessel networks appearing in medical images, present a challenge for registration methods. To assess the performance of the proposed method on deformable registration of images containing thin structures with varying thickness and contrast, we construct a synthetic evaluation task, using real medical (retinal) images. Available annotated groundtruth masks of the vessel networks enable quantitative performance assessment based on comparison of the degree of mask overlap after registration.
Most existing deformable registration methods rely on similarity measures that are based on differences in intensity of overlapping points. Such similarity measures are potentially ill-suited for thin structures, where large transformations are typically handled by use of Gaussian pyramids with a high degree of smoothing, a strategy that risks erasing the structures of interest. INSPIRE, on the other hand, can deal with large transformations without much smoothing.

Construction of the dataset.
We create a new dataset suitable for evaluation of image registration methods, with focus on thin structures of varying thickness and contrast. Starting from the High Resolution Fundus Image Database [37], which consists of 42 images of size 3504 × 2336, we perform the following sequence of processing steps: • The images are converted from RGB to grayscale; • Background variation is removed with a rolling ball of radius 20 pixels, to remove low frequency features which may be exploited by a registration method; • The images are padded (with 350 px on each side); • The images are deformed in a coarse-to-fine sequence of random elastic transformations (Bsplines with 7, 14, 24, 48 control points, with random perturbations to each parameter of (−250, 250), (−50, 50), (−25, 25), (−15, 15) at each level respectively, creating 42 pairs of "floating" (deformed) and "reference" (non-deformed) images.
The last two image pairs, 41 and 42, are designated training images. The created dataset is accessible on the Zenodo platform [38]. An example image pair is shown in Fig 1(a).

Experiment setup.
In this study, INSPIRE is benchmarked against widely used general purpose registration methods, the parametric registration toolkit elastix [11], and the non-parametric ANTs SyN [16]. We also evaluated a feature-based method based on SIFT-features [39] (using a variety of settings), but the obtained feature point-sets were consistently too noisy to be useful, so we excluded that approach from the full study. Finally, we also compare with the recent learning-based method VoxelMorph, both without (VM) [13,21] and with (100 steps of) instance optimization (VM-IO).

Method configuration.
Parameter tuning of all the methods were performed on the training image pairs 41 and 42. The main parameters tuned for all the methods are: number of pyramid levels and step-lengths (or maximum gradient magnitude for elastix); additionally for ANTs SyN: field regularization, and for INSPIRE: inverse inconsistency factor λ and gradient-weighted sampling parameters m and σ GM . The parameters chosen for INSPIRE are listed in Table 1.
For elastix, we used mean square difference as distance measure, a 6 level Guassian pyramid, a B-spline transformation model with final grid spacing of 50 pixels, adaptive stochastic optimizer, 3000 iterations per level and 40000 random spatial samples per iteration.
For VoxelMorph, we configured it using the default settings with MSD as the distance measure, which is appropriate due to the the registration task being monomodal, and trained until convergence which occurred after 20000 iterations. Both the plain version (denoted VM) [21], as well as the version of the method that includes (100 steps of) instance optimization (denoted VM-IO) [13] are included. Since the images did not fit into the GPU, we performed both training and inference on sub-patches of 512 × 512 pixels with 50% overlap in each direction (yielding a 1024 × 1024 image), such that no deformation is larger than what can fit into this overlap. Furthermore, we split the dataset in two folds, where fold-1 comprises the first 19 images and fold-2 comprises the last 19 images, and select the reference image for half of the images in each fold and the floating image for the other half.    due to the lack of overlap between the corresponding structures in the reference and floating images and therefore correspondences need to be established over large distances; even though the corresponding structures are within each other's receptive field (for the chosen U-net architecture), the registration is unsuccessful.
As an ablation study, we also observed INSPIRE with λ = 0, which corresponds to disabling the inverse inconsistency regularization, as well as INSPIRE with m = 0, which corresponds to turning off GW sampling. Both features show to contribute to improved performance and robustness.
In addition to the quantitative results presented here, we include an illustrative animation of an image registration process with INSPIRE on image 4 in the dataset (shown in Fig 1), enabling a qualitative assessment of the method, included as supplementary material in S1 File.

Accuracy and number of iterations.
In image registration, there is typically a tradeoff between the robustness and quality of the results, and the time required to obtain them. We have quantified this trade-off observing the registration of a subset (first 4) of the observed retinal images. At every 10-th iteration of the registration we output the registered ground-truth mask and compute the Jaccard index. The resulting graph, shown in Fig 5, indicates that if speed is of a higher priority than accuracy, there is a lot of room for reducing the number of iterations. See also Sec. 4.4.

Image registration of real retinal image pairs
Having observed excellent performance on the synthetic retinal image dataset, we proceed to evaluate the proposed method on real retinal image pairs, and use the open dataset Fundus Image Registration Dataset (FIRE) [40]. The FIRE dataset consists of 134 retinal colour image pairs to be registered, each pair consisting of two distinct image acquisitions of the same retina.

Preprocessing, rigid registration and method configuration.
Before we can perform deformable registration of the images with INSPIRE, a few preprocessing steps are required. • (i) We remove the colour from the images through averaging of the three colour channels; • (ii) We remove the bias field by subtracting a blurred image (Gaussian blur with σ = 10) from the original image; • (iii) We perform a robust normalization of the images (only considering the contents within the mask) with percentiles chosen as (0.01 and 0.75) which eliminates some bright artifacts which can disturb the registration.
After the preprocessing we perform an initial rigid registration to find a coarse alignment before performing the deformable registration. Given the small overlaps in the P subset, we perform a global rigid alignment by computing the d aAMD (3) for all discrete translations, which can be computed efficiently using the Fast Fourier Transform (FFT) algorithm [23]. This step is repeated for 32 rotation angles [−10, 10] discretized in a grid, and implementing the use of masks in a similar way to the process described in [41]. The rigid registration is performed on 4×-downsampled images (along each dimension) for efficiency reasons, and with the requirement of at least 40% overlap of the masks to filter out rigid transformations where the overlapping region contains no vessel-structures. The full preprocessing and rigid registration implementation is included in [36].
The parameters chosen for INSPIRE (the deformable registration) are listed in Table 2.

Experiment setup.
We apply the standardized evaluation procedure related to the FIRE dataset [40] by comparing the mean Euclidean distance between ground-truth landmarks provided by an expert annotator in the reference space after registration. Finally, we examine the results in terms of success-rate, i.e., the fraction of successful registrations out of all performed registrations, as a function of a success-threshold of the mean Euclidean landmark distance.

Results.
We present the results of the experiment in Fig 6. INSPIRE outperforms the domain-specific methods substantially, being both more robust as well as more accurate in comparison with the considered methods. The proposed approach failed on the rigid registration for a single image pair (from the P subset) which had a very small overlap.

Deformable image registration of 3D MR images of brains
A common task used to assess the performance of deformable image registration methods is registration of manually labelled 3D MR images of brains, using the target overlap (measured in the reference image space) of each distinct label as the quality measure [5,14].

Experiment setup.
We consider four public datasets in this evaluation, LPBA40 [47], IBSR18, CUMC12, and MGH10, consisting of 40, 18, 12 and 10 images respectively. Within each dataset, each image is registered to all other, giving a total of 2088 pairwise registrations. We follow the evaluation protocol of [5], thereby enabling direct comparison with 17 state-of-the-art methods, including two deep learning-based registration methods, Quicksilver [14] and VoxelMorph [13]. Following [5], each image is first affinely registered to the ICBM MNI152 non-linear atlas [48] using NiftyReg [25], such that each image is of size 229 × 193 × 193, except for LPBA40 where the images are of size 229 × 193 × 229. Table 3, is based on the configuration used in Sec. 4.1, with a few changes on account of the smaller image diameter to (i) make the optimization more stable and (ii) speed up the method.  Histogram equalization was enabled due to the difference in contrast of the different volumes, which was not sufficiently addressed by normalization. Based on prior experience [15], we omit downsampling and smoothing of the (already fairly low-resolution) 3D MR brain volumes. Configuration of the reference methods are detailed in [5,14]. We also included the recent and widely used unsupervised learning-based method VoxelMorph [13], which is configured using the default settings with MSD as the distance measure, which performed well in the original study [13] on similar data, and trained until convergence-which occurred after 20000 iterations. Both the plain version (denoted VM) [21], as well as the version of the method that includes (100 steps of) instance optimization (denoted VM-IO) [13] are included in our evaluation.

Results.
The performance of INSPIRE, in comparison to the performance of 17 other deformable registration methods, on the 4 considered datasets of MR brain images [5,14], is summarized in Fig 7. FLIRT denotes the output of an established affine registration method included for reference. All the registrations start from pre-processed images obtained (together with tables of results) through correspondence with the authors of [14]. The results are presented as box-and-whiskers plots which visualize the shape of the full empirical distribution of the results over all registrations for each method, with one plot per dataset. The top 7 ranked methods for each dataset w.r.t. mean average target overlap are shown in Table 4. INSPIRE is the top ranked method on the IBSR18 and MGH10 datasets, and the second best on the LPBA40 and CUMC12 datasets. Deformable image registration is a challenging task [5,13], where some commonly used methods can require several minutes or even hours of processing time; methods using direct optimization of (dense) displacement fields tend to have especially high run-time, while model-based methods using B-Spline transformations are substantially faster, and learning-based methods tend to be the fastest type of methods.
Here we evaluate the empirical run-time of INSPIRE, elastix, ANTs SyN, as well as VoxelMorph.
We run INSPIRE and VoxelMorph, with 4 repetitions, on (i) a retinal image pair (image pair 1), and (ii) a brain image pair (MGH10, images 1 and 2), and measure the mean and stddev of the run-time, when using 14 worker threads on a 3.3 GHz Intel(R) Core(TM) i9-9940X processor with 14 physical cores, 19.25 MB cache, 96 GB RAM, and a NVidia GeForce RTX 2080 Ti GPU (used by VoxelMorph).

Results.
The run-times for both the retinal and the brain data are shown in Table 5. As can be seen for the comparison on the retinal data, INSPIRE exhibits efficient run-time in comparison to other non learning-based methods while achieving high accuracy. VoxelMorph is substantially faster, while on the other hand, achieving very much lower (substandard) accuracy on the retinal images and lower accuracy on the brain images. The time to train a Voxel-Morph model for the retinal dataset was 3062s and the time to train a VoxelMorph model for the brain data was 16000s.

Ethics statement
The High Resolution Fundus Image Database, which the synthetically deformed dataset is based on, consists of a publicly available and anonymized dataset of retinal images. This is a retrospective study and only anonymized data is used, thus ethical approval is not required. The FIRE dataset consists of 134 images from 39 male and female patients who provided informed consent for the analysis as stated [40]: "Written informed consent was obtained before data acquisition and processing." Since we only perform a retrospective study on these images for the stated purpose, ethical approval was not required.
The 3D study has been performed retrospectively on four datasets of skull-stripped and anonymized brain volume images used soley to evaluate performance of registration methods, a study which requires no ethical approval.

Discussion
INSPIRE is highly configurable, and therefore includes a number of hyper-parameters, just like most registration frameworks; new tasks may require some tuning to reach optimal performance. We observe that the parameters which are main candidates for tuning are the resolution pyramid settings (number of levels, smoothing σ, subsampling factor, number of control-points), normalization and histogram equalization (depending on image intensity distributions), and optimizer step-size (depending on the size of the images and of the deformations). The remaining parameters usually do not require careful tuning.
INSPIRE is a mono-modal registration method; it is based on a distance measure that assumes similar corresponding intensities for good performance. This work has, accordingly, only considered applications where differences in the distribution of intensities could be mitigated using normalization and histogram equalization. INSPIRE supports multi-channel images by marginal computation (and aggregation) of the distances and gradients (per channel separately), optionally utilizing [49].

Conclusion
We have presented a new deformable image registration framework, INSPIRE. By integrating an inverse inconsistency constraint into an existing asymmetric distance measure, we formulated an objective function which enables finding (approximately) invertible transformations between images, even when the chosen transformation model is not invertible or closed under inversion. By introducing a novel method for estimating the objective function and its derivatives, based on Monte Carlo sampling, even very large images can be registered without requiring much memory, and without quantization of the image intensities. A proposed gradientweighted sampling scheme enables efficient sampling, leading to increased accuracy for a given computational cost.
INSPIRE exhibits excellent performance on deformable registration of synthetically deformed retinal images, where the structures of interest are thin and sparse, while other compared methods fail to solve the task satisfactorily. INSPIRE dramatically outperforms recent learning-based method VoxelMorph on the retinal image registration task. In a further study using the FIRE dataset, consisting of the registration of a set of 134 pairs of retinal images, we observe that INSPIRE exhibits outstanding performance, substantially outperforming all evaluated methods, consisting of several which are designed specifically for the application domain. Our evaluation of INSPIRE on a benchmark of pairwise inter-subject registration on 4 datasets, enables its direct comparison with 17 existing methods. INSPIRE demonstrates the overall best performance, ranking first on two datasets and second on the other two. INSPIRE outperforms the considered learning-based approaches on all 4 datasets, without requiring any training (data).
Supporting information S1 File. Illustrative animation of INSPIRE applied to a retinal image pair. Magenta/cyan representations of the intensity images (the better the alignment, the less magenta/cyan colour is visible in the image), binary vessel masks (only for evaluation purposes, not used for registration), and the deformation field displayed as a chessboard pattern are included, as well as the objective function J of the registration plotted as a function of the iteration number. (GIF)